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We present a solution of the full time-dependent density-functional theory (TDDFT) eigenvalue 
equation in the linear response formalism exhibiting a linear-scaling computational complexity with 
system size, without relying on the simplifying Tamm-Dancoff approximation (TDA). The imple¬ 
mentation relies on representing the occupied and unoccupied subspace with two different sets of 
in situ optimised localised functions, yielding a very compact and efficient representation of the 
transition density matrix of the excitation with the accuracy associated with a systematic basis set. 

The TDDFT eigenvalue equation is solved using a preconditioned conjugate gradient algorithm that 
is very memory-efficient. The algorithm is validated on a small test molecule and a good agreement 
with results obtained from standard quantum chemistry packages is found, with the preconditioner 
yielding a significant improvement in convergence rates. The method developed in this work is 
then used to reproduce experimental results of the absorption spectrum of bacteriochlorophyll in an 
organic solvent, where it is demonstrated that the TDA fails to reproduce the main features of the 
low energy spectrum, while the full TDDFT equation yields results in good qualitative agreement 
with experimental data. Furthermore, the need for explicitly including parts of the solvent into the 
TDDFT calculations is highlighted, making the treatment of large system sizes necessary that are 
well within reach of the capabilities of the algorithm introduced here. Finally, the linear-scaling 
properties of the algorithm are demonstrated by computing the lowest excitation energy of bacte¬ 
riochlorophyll in solution. The largest systems considered in this work are of the same order of 
magnitude as a variety of widely studied pigment-protein complexes, opening up the possibility of 
studying their properties without having to resort to any semiclassical approximations to parts of 
the protein environment. 


I. INTRODUCTION 

The study of optical properties of large complex sys¬ 
tems is of increasing interest in computational biol¬ 
ogy, with most efforts being focused on understanding 
large pigment-protein complexes (PPCs)[TH5]. These 
systems turn up in a variety of different roles in na¬ 
ture, from biosensors to light-harvesting and linker com¬ 
plexes in photosynthetic bacteria and ab-initio compu¬ 
tational studies can play a key role in gaining a deeper 
insight into the mechanisms governing them. However, 
PPCs are generally characterised by the fact that the 
protein environment plays an important role in influenc¬ 
ing the absorption properties of the pigment, creating 
the need for large-scale quantum mechanical calculations 
that are computationally challenging [2H4]. In general, 
time-dependent density-functional theory (TDDFT) [8], 
the time-dependent extension to ground-state density- 
functional theory (DFT)[J., S3: is considered the method 
of choice when treating this class of systems, mainly 
due to the good balance between computational cost 
and achievable accuracy for most common choices of 
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exchange-correlation functionals. 

In recent years there have been a number of devel¬ 
opments in computational algorithms [9UTT1 that have 
helped to make medium-sized systems routinely acces¬ 
sible to TDDFT. However, most common approaches 
to solving the low energy spectrum of a system using 
TDDFT show a computational complexity of at least 
0(N 3 ) with system size, imposing an upper limit on the 
system sizes that can be realistically studied and effec¬ 
tively ruling out a full treatment of the PPCs mentioned 
above. To treat these large biological systems explicitly 
in TDDFT, it is necessary to make use of computational 
approaches that scale linearly with system size. 

TDDFT is generally considered in two different 
flavours. The time domain approach, where the Kohn- 
Sham equations are propagated explicitly in time|12j. 
and the linear response approach[13], where the excita¬ 
tion energies of the system can be recast as the solutions 
to an effective eigenvalue equation[Tl, 35] • The time- 
domain approach can yield the entire spectrum of the 
system via a Fourier transform to the frequency domain, 
is non-perturbative and can thus be applied to problems 
beyond the linear response regime. However, it comes 
with the disadvantage that the Kohn-Sham equations 
have to be propagated sufficiently long to obtain nar¬ 
row line-widths and dark states cannot be resolved. The 
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direct solution of the TDDFT eigenvalue problem on the 
other hand can be used to obtain dark states and triplet 
transitions that are of interest in some photochemical 
processes. For this reason, we focus on the linear response 
flavour of TDDFT for the purpose of this work. In the 
time-domain TDDFT approach, an 0(N ) computational 
effort with system size can be achieved by extending 
linear-scaling techniques known from ground-state DFT 
to the time-dependent Kohn-Sham equations [TfiUTgl . In 
the linear response approach, algorithms capable of solv¬ 
ing for the lowest eigenvalues of the TDDFT eigenvalue 
equation are also known [T5U21j . opening up the possi¬ 
bility of a direct computation of excited states in large 
pigment-protein complexes without relying on additional 
semi-classical approximations. 


The linear response TDDFT equation is a non- 
Hermitian eigenvalue problem, causing it to be difficult 
to solve using standard off-the-shelf eigenvalue solvers. A 
simplifying approximation, known as the Tamm-Dancoff 
approximation (TDA) [221 [25, recasts the problem into 
an Hermitian one but its effect on excitation energies 
and oscillator strengths is not straightforwardly under¬ 
stood. In this work, we introduce a linear-scaling imple¬ 
mentation of full TDDFT in the framework of the DNETEP 
code[24], without relying on the TDA, as was required in 
a previous approach[20]. We test the performance of the 
algorithm on a number of large scale systems and specif¬ 
ically investigate the quality of the full TDDFT eigen¬ 
states to those obtained within the TDA. The largest 
systems considered explicitly in this work are of similar 
size as a number of widely studied PPCs (see for example 
[ 315 ), thus highlighting the capabilities of the algorithm 
developed here to enable a fully ab-initio treatment of 
this class of systems. 


This work is organised as follows: Section [TT] focuses 
on providing a short overview of the theoretical back¬ 
ground necessary for the main results presented in this 
work, with sections |II A| and |II B| introducing the linear 
response formalism to TDDFT, both in the form of a 
matrix eigenvalue equation and an effective variational 
principle. Section |II C| then provides a short summary 
of the DNETEP code in which the algorithm presented is 
implemented, as well as an overview over a solution to 
the Tamm-Dancoff eigenvalue problemfSO). In section III 


the linear-scaling solution to the full TDDFT eigenvalue 
equation is outlined, with a special focus b eing p laced on 
the appropriate choice of preconditioner (III B) for the 
conjugate gradient algorithm. The power of the method¬ 
ology developed here is demonstrated on a small test sys¬ 
tem by comparison to accurate benchmark results (IV AI, 
before moving on to a realistic system of bacteriochloro- 
phyll a in an organic solvent. It is shown that the al¬ 
gorithm developed here scales fully linearly with system 
size and allows the treatment of systems inaccessible by 
conventional approaches. 


II. THEORETICAL BACKGROUND 

In this section we briefly introduce the theoretical 
background of linear response TDDFT. We consider a 
Kohn-Sham system with ground-state density and 
occupied and unoccupied Kohn-Sham states {ip^ } and 
{i/>^ s } respectively, where a denotes a spin index. We 
limit the discussion to isolated systems, such that the 
Kohn-Sham eigenstates can be chosen to be real. Fur¬ 
thermore, only semi-local exchange-correlation function¬ 
als in the adiabatic approximation will be considered, 
thus ignoring any long-range and memory effects. While 
memory effects are routinely ignored in standard TDDFT 
implementations, long-range interactions can be included 
in form of hybrid functionals and are known to yield a 
better description for excitations in infinite systems and 
charge-transfer states, where semi-local exchange corre¬ 
lation functionals are known to fail[T5J US]. However, for 
the purpose of this work the main focus is on excitations 
that retain a localised character and that are thus well 
described by semi-local functionals. 


A. Linear response TDDFT 


In linear response TDDFT, the individual excitation 
energies of the system can be obtained by solving a non- 
Hermitian eigenvalue equation of the form [Til TE, 


A B 
B -A 


(1) 


Here, A and B denote block matrices that can be conve¬ 
niently expanded in a basis of unoccupied and occupied 
eigenstates of the ground state Kohn-Sham system: 


Acva^c'v'cr' — 3 a(7 ' 5 CC ' S vv f i^c'cr' ^ v'a 0 "f 


KS 


cva.cv a 


( 2 ) 
(3) 

where {e^ s } and {e^®} denote the eigenvalues associated 
with the unoccupied and occupied Kohn-Sham states re¬ 
spectively. The eigenvectors are made up of two different 
components X and Y that can be thought of as excita¬ 
tion and de-exitation contributions to the corresponding 
eigenstates. As can be seen from Eq. [l] the effective 
non-Hermitian TDDFT eigenvalue matrix can be char¬ 
acterised by a diagonal part consisting of Kohn-Sham 
transitions between occupied and unoccupied states and 
off-diagonal coupling terms described through the cou¬ 
pling matrix K. The exact form of K depends upon the 
exchange-correlation functional used. Here, we will limit 
our attention to (semi)-local functionals in the adiabatic 
approximation, in which case the matrix elements of K 
can be expressed as 


K, 


cva.c' V a' 


5 2 E k 


= j d 3 rd 3 r'4 K a s (r)C s (r) 
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where E xc is the exchange-correlation energy. In the re¬ 
maining part of this work, all spin indices will be dropped 
for convenience. 

From the structure of Eq. [l] it can be seen that the 
TDDFT eigenvalue equation has solutions in the posi¬ 
tive and negative frequency domain. These positive and 
negative frequencies can be interpreted as excitation and 
de-excitation energies [T5]. Note that due to the struc¬ 
ture of the equation, the positive and negative eigenvalue 
solutions are coupled via the block matrix B. Assuming 
that the coupling of excitations to the de-excitation part 
of the full TDDFT spectrum is small, one can set the 
coupling matrices B to zero, which causes a complete de¬ 
coupling of the excitation and de-excitation part of the 
spectrum. The positive excitation energies can then be 
solved for via the Hermitian eigenvalue equation 

AX = uX. (5) 

Solving Eq. [5] instead of Eq. [ljis referred to as the Tamm- 
Dancoff approximation (TDA) [221 125]. While the TDA 
is often reported to yield reliable excitation energies in 
many situations [23., the structure of the equation vio¬ 
lates both time-reversal symmetry and important sum 
rules related to the oscillator strengths f2<J] of the exci¬ 
tations. Furthermore, there are known cases where the 
TDA yields to significant errors in excitation energies [27], 
making a treatment of the full eigenvalue problem desir¬ 
able. However, as will be discussed in more detail in the 
next section, the main disadvantage of a full treatment 
of the TDDFT eigenvalue equation originates from the 
fact that Eq. [T] constitutes a non-Hermitian eigenvalue 
problem, meaning that a variety of standard numerical 
methods for computing the eigenvalues of large matrices 
cannot be straightforwardly applied to it. 


B. Iterative solutions to the TDDFT equation 

The dimensions of the TDDFT eigenvalue equation 
grow as 0(N 2 ) with system size, making a direct diago- 
nalisation of the matrices in Eq. [l] or Eq. [5] undesirable 
for larger systems. Furthermore, one is often interested 
in a relatively small number of low energy excited states 
in the visible or ultraviolet energy range of a system, 
such that the computation of high energy excited states 
is unnecessary. The best approach to tackle the TDDFT 
eigenvalue problem for real systems of interest is thus to 
use iterative methods in order to calculate the lowest few 
excited states. 

Within the TDA, such an iterative scheme is straight¬ 
forwardly defined, since the Hermitian properties of the 
block matrix A allow for the definition of the lowest ex¬ 
citation of the system via a variational principle: 

^mPn A = min ^TDA (X) = min ■ (6) 

From this definition, the gradient of the Rayleigh-Ritz 
functional Dtda(X) with respect to X can then be 


straightforwardly computed 


9D tda (X) _ _ 2 

— Stda — 


ax 


x f x 


ax-NA^x 

x f x 


■ ( 7 ) 


This gradient can be used as a steepest-descent search 
direction in a conjugate gradient algorithm to optimise a 
random starting vector X guess until the lowest eigenstate 
has been obtained. Note that in order to compute 
the gradient in Eq. [TJ it is only required to evaluate the 
matrix-vector product AX. This means that A does 
not have to be explicitly calculated or stored during the 
calculation, which is prohibitive for large systems. 

The definition of the lowest eigenvalue of the Tamm- 
Dancoff TDDFT equation via the variational principle 
relies on the fact that it constitutes an Hermitian eigen¬ 
value problem and is thus not generally possible for the 
full TDDFT equation. It was however pointed out by 
Thouless [25| that since the blocks A and B of the full 
TDDFT equation are Hermitian, a variational principle 
can be formulated for its lowest positive eigenstate via: 


^min — min 0 T hou(X,Y) 
(XY) 


(x f yn 


= mm 
(XY) 


A B 
B A 


X f X) - (Y T Y)| 


( 8 ) 


The numerator of the above expression is guaranteed 
to be positive semi-definite for (semi)-local exchange- 
functionals, while the denominator is forced to be posi¬ 
tive by taking the absolute value. 

While it is possible to use the above variational prin¬ 
ciple to directly obtain the lowest excited state of the 
full TDDFT equation, the Thouless functional is not 
the most ideal formulation of the problem in the con¬ 
text of semi-local exchange-correlation functionals (see 
Sec. Ill A |. A more computationally efficient reformula¬ 
tion was introduced by Tsiper [25] by noticing the equiv¬ 
alence of the full TDDFT eigenvalue problem to that of 
a set of classical harmonic oscillators. As a first step, 
one introduces the effective vectors p = X — Y and 
q = X+Y. It can then be easily shown that the Thouless 
functional can be rewritten as: 


w mi „ = min fl Ts ip(p, q) 

<p> <i')( A 0 B A “ B ) (;;) 

= S3- |ptq + q<p| -■ <9) 

Here, the variational principle is again only defined for 
the lowest positive excitation of the system. Just like in 
the Thouless functional, the numerator of the functional 
is guaranteed to be positive-semidefinite. The denom¬ 
inator has to be forced to stay positive by taking the 
absolute value, since p^q + q^p is not guaranteed to be 
positive-semidefinite. 
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While the variational principles in Eqns. [6] and [9] are 
written in terms of the lowest excitation of the system 
only, the concept can be straightforwardly extended to 
higher excitations. In order to converge the second-lowest 
excitation u >2 of the system, the same effective functional 
of Eqn. [9]can be used in full TDDFT, with the additional 
constraint that p 2 and q 2 , the trial vectors associated 
with ui 2 obey an effective orthogonality constraint of the 
form 


pIq 2 + qlp 2 


= 0 


( 10 ) 


where p x and q : are taken to denote the vectors associ¬ 
ated with w m in, the lowest excitation of the system. The 
principle can be extended to an arbitrary number of ex¬ 
cited states to be converged, where the vectors p 4 and 
q^ associated with the i th excitation are constrained to 
be orthogonal to the vectors of all other excitations via 
Eqn. [lO] 


C. Tamm-Dancoff TDDFT in the ONETEP code 


The linear-scaling solution to the full TDDFT equation 
developed in this work (see Sec. Ill I is implemented in the 
ONETEP code [H]. As with other linear-scaling DFT ap¬ 
proaches, any reference to individual Kohn-Sham states 
{V^} is given up in favour of a collective representation 
in the form of the single particle density matrix pi”! (r, r') 
such that 


Nocc 

P M (r,r 7 ) = ^ ^* S (r)^ S (r / ) 

V 

= ^i(r)pW^^(r') (ii) 

af3 


where {(p a (r)} denotes a set of in situ optimised [21] 
localised atom-centered support functions referred 
to as non-orthogonal generalised Wannier functions 
(NGWFS)[33] and pi"! is the single particle density ma¬ 
trix in the representation of those NGWFs. Linear scal¬ 
ing of computational cost with system size is then ob¬ 
tained by exploiting the fact that the ground-state den¬ 
sity matrix decays exponentially for any system with a 
band gap, causing pi 1 '! to be sparse for sufficiently large 
system size(34, 35j . 

The in situ optimisation of the support functions {(f> a } 
means that only a minimal number of functions is re¬ 
quired to accurately span the occupied subspace, but the 
unoccupied subspace is generally badly represented [30] . 
This issue is overcome by optimising a second set of NG¬ 
WFs {Xa} for a low energy subset of the unoccupied sub¬ 
space that is represented by the effective density matrix 
P {c} [SH- It has been demonstrated [2D] that the compact 
sets of support functions {4>a} and {xp} provide a very 
good representation for low energy excited states as cal¬ 
culated in the TDA. Defining an effective response den¬ 
sity pi 1 ! (r) associated with a TDA eigenvector X such 


that 


p {1} (r) 


5> c KS (r)X c ^ KS (r) 

C,V 

J2xa(r)P {1}ap M^) 

ot/3 


( 12 ) 


it becomes clear that the effective response density ma¬ 
trix pi 1 ! is the representation of X in mixed unoccupied- 
occupied NGWF spacej20L The matrix-vector product 
f = AX can then be directly constructed in NGWF space 
as 


f TDA = P {c} H x P {1} — p {1} H^P {u} 


+P 


(4 


(v {1 

y v sc 


}x<t> 

SCF 


pf 1 } ^ p(4 


(13) 


Here, H x and H e ' denote the ground state Kohn-Sham 
Hamiltonian in the {x Q } and {4>p} representation respec¬ 


tively. 


pf 1 } 


is the self-consistent field response 


of the system due to a perturbation pi 1 ! (r) in the ground 
state density [2D] and is the result of X acting on the cou¬ 
pling matrix K in mixed unoccupied-occupied NGWF 
space. Note that f^DA represents a contravariant tensor 
quantity and has to be multiplied by S x and S^ from the 
left and right respectively to obtain a covariant quantity 
(see m for further details). The lowest excitation energy 
of the system can then be written as 


k^min 111111 - -II) A 
p{i> 


.{H 


Tr 


»{!}t gYfX0g0 


Tr 


pDHgxpDIg^ 


(14) 

where S x and denote the overlap matrices of the {x Q } 
and {4>p} NGWF representation respectively. Higher ex¬ 
cited states can be obtained from the same variational 
principle by enforcing an orthogonality constraint be¬ 
tween all excited states. If all involved density matri¬ 
ces pi 1 !, pT"! an d pf c } can be treated as sparse for 
sufficiently large system size [52], evaluating Eqs. 13 and 
14 scales as O(N) with system size and can be 

computed in linear-scaling effort using standard iterative 
approaches. 

The above formulation yields accurate excitation ener¬ 
gies if X is well-represented by the low energy subset of 
unoccupied states for which pl c ! is optimised. However, 
in many scenarios it is desirable to include higher energy 
unoccupied states in an approximate manner to achieve 
convergence [2D] [31]. One straightforward way of doing 
so is to introduce the joint unoccupied-occupied repre¬ 
sentation {tpa} = {< t>/ 3 } © {x-tI and t° redefine pl°! via 
a projector onto the entire unoccupied subspace repre¬ 
sentable by {y^}[3lj: 

p{°l = (S^) -1 - (S^) -1 S^P M S^ (S v ) _1 . (15) 


Here, the elements of S^ are given by = {‘Pal&p)- 
While using the joint representation for the unoccupied 
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subspace instead of {xa} roughly doubles the computa¬ 
tional cost compared to using {Xa}, it yields consistently 
good TDDFT excitation energies [5T| f5^] and will be used 
throughout in Sec. [IVJ However, for the purpose of clar¬ 
ity in outlining the linear-scaling TDDFT algorithm in 
Sec. |III| we shall use {Xa} to label the unoccupied space, 
noting that, if desired, the representation can be replaced 


by {ipp} and the projector of Eq. 15 


III. FULL TDDFT IN ONETEP 

In this section, we will outline a conjugate gradient al¬ 
gorithm to compute the lowest N u excited states of the 
full TDDFT equation. While other iterative eigensolvers 
like the Lanczos and Davidson algorithms and multishift 
methods have been applied to this problem, both in the 
framework of standard cubic scaling[9] [TT] and CFAQ [Hi] 
approaches, the conjugate gradient method is chosen here 
for both its good performance in the TDAJ2IJ and its 
low memory requirements suitable for large scale appli¬ 
cations. A special focus will be put on an effective pre¬ 
conditioning scheme, as well as the linear-scaling prop¬ 
erties of the algorithm. As in the previous section, the 
discussion is limited to semi-local exchange-correlation 
functionals only. 


A. The Tsiper functional in mixed {Xa}~{0/3} 
NGWF space 


The key to obtaining a linear-scaling implementation 
of the full TDDFT equation is to rewrite the Tsiper func¬ 
tional in the same compact, localised NGWF represen¬ 
tation that has been used to rewrite the Tamm-Dancoff 
functional in Eq. 


14 


For this purpose, we define f Tsip as 


(A-B 0 \ fp\ _ ( Ap - Bp\ 

'v 0 A + By yqy ~~ yAq + Bqy 

(16) 

Following analogous steps to the derivation of the linear- 
scaling solution to the TDA eigenvalue equation PU], we 
define the effective response density matrices P^ and 
and response densities p^ (r) and p^ (r) such that 



pM( r ) 


X^ S ( r )P™^ S (r) 

C,V 

5>«(r)P {p} “%(r) 

a/3 


(17) 


with p{ q } (r) following an analogous definition. Thus P^ 
and P^ are the matrices p and q in {Xa}-{'Aa} NGWF 
representation. Just like P^|20|. P^ and P^ have 
to follow an effective invariance constraint in order to be 
valid response density matrices, which originates from 
the orthogonality between the unoccupied and occupied 


Kohn-Sham spaces in which p and q are represented. For 
P^ p }, the invariance constraint can be written as 


>{p}' _ pDlgxpMg^pM _ pM 


(18) 


with an identical statement for P^. 

From Sec. |II C[ the action of A acting on some vector 
X written in {Xa}-{0/3} NGWF space is already known. 
Using Eq. 13 it is straightforward to rewrite f Tsip in {x' Q } 


and {4>p} representation such that 


cx<i> _ 

±Tsip — 


'f$ } \ - /P {c} H x P {p} - pW^pW 

f x<t> J ~~ ^p( c l jjxpl 1 ?} _ p{g}jj0p{<d 


2P {c} f V, 


0 

{i} x<l> 
SCF 


>M 


>M 


(19) 


The advantage of the reformulation of the Tsiper func¬ 
tional in terms of p and q now becomes apparent. In 


order to evaluate Eq. 19 for any semi-local exchange- 


correlation functional, it is sufficient to evaluate 

only once. Since calculating Vg^p ^ is generally the most 
expensive part of applying the TDDFT operator, it fol¬ 
lows that computing Eq. is not significantly more ex¬ 
pensive than evaluating Eq. 13j suggesting that a full so¬ 
lution of to the TDDFT equations can be of similar com¬ 
putational complexity to the solution to the TDA{3Bj. 

Using Eq. [l9j the lowest excitation energy of the sys¬ 
tem as specified by the Tsiper functional ©> can then be 
rewritten in {Xa} and {4>p} representation 


Tr 


|p{p} p{q}| I 2 Tr 


pfpltgXfW g <t> 
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p{p}tgxpl«}g</ 


Tr 


+ 


2 Tr 


p{<?}t gXfX'/’ g 

M 


pfpltgYpMg 



( 20 ) 


where the minimisation is carried out under the normal¬ 
isation constraint 


Tr 


p{p}tg\p{9}g0 


= 1. 


( 21 ) 


Specifying the normalisation constraint allows us to drop 
the absolute value from the denominator of the Tsiper 
functional when computing the gradient of Eq. 1201 w ith 

the 
be, 
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respect to changes in (P^ p ^ P^). Using Eq. 
contravariant gradient of the Tsiper functional can 
in close analogy to the gradient in the TDA ([7]), written 


as 



p{p}tgXfX0 a4 

M 


( 22 ) 


This result exploits the fact that P^ and P^ follow 
the normalisation constraint. The above gradient can 































6 


be used as a steepest-descent search direction in a con¬ 
jugate gradient algorithm[37], Note however, that since 
the TDDFT operator in the TDA can be computed in 
linear-scaling effort j20j, the evaluation of both f^ ip and 
®Tsip a ^ so sca ^ es fully linearly with system size, as long 

as all involved density matrices pl c f, P^, pf p f and 
pl 9 f can be treated as sparse for sufficiently large sys¬ 
tem sizes. Furthermore, it is worth pointing out that for 
any matrix pair pl p f and pl^f obeying the invariance 
constraint of Eq. jl8j the gradient g^ ip follows the invari¬ 
ance constraint by construction. This condition is vital 
as it means that any pair of trial response density ma¬ 
trices updated with a search direction derived from 
will also obey the appropriate invariance constraint by 
construction. 

It should be noted, that once a truncation of the den¬ 
sity matrices P^ and pl g l is introduced, the invari¬ 
ance constraint of Eqn. [18] can only hold approximately. 
While it is possible to iteratively apply the projection 
at the end of each conjugate gradient step until some 
measure of the violation of the invariance constraint is 
kept below a certain threshold[2Dj [32], in practice this 
can lead to convergence problems as it destroys the varia¬ 
tional nature of the algorithm presented here. This prob¬ 
lem is overcome by introducing a set of auxiliary density 
matrices l/ 9 f and 32j in the spirit of ground state 
linear-scaling approaches [551. The auxiliary matrices can 
be arbitrarily truncated and are used to define the real 
density matrices pf ? f and P^ used in the algorithm via 

P M = P to S x L {p} S *pW (23) 

at every step of the calculation. While this scheme comes 
at a computational cost as pf p f is less sparse than l/ p f, 
it guarantees that every point in the algorithm, P^ and 
pf ? f fulfil their respective invariance constraints to the 
degree that P^ and pf c f fulfil their respective idempo- 
tency constraint [32j. Since linear-scaling DFT calcula¬ 
tions employ a number of techniques that ensure that for 
sensible truncation schemes, pf u f and pf c f retain near- 
idempotencypJS], the above scheme yields a robust con¬ 
vergence of TDDFT calculations when truncations are 
applied to the respective response density matrices. 


B. Preconditioning 

The TDDFT eigenvalue problem generally has a large 
condition number associated with it, causing iterative 
eigensolvers to show a relatively slow convergence. This 
is most easily appreciated by considering that the ele¬ 
ments of the coupling matrix K are generally small com¬ 
pared to the diagonal elements of Kohn-Sham eigenvalue 
differences and the condition number of the full TDDFT 
matrix is reasonably well approximated by the condition 

number of , where the elements of the block ma¬ 


trix D are given by 

D cv ,c'v' = <W<W(e^ S - e^ S )- (24) 

Clearly, D has a condition number that is much larger 
than 1, resulting in relatively slow convergence of itera¬ 
tive eigensolvers. 

For these reasons, it has long been appreciated that 

J should form an efficient preconditioner for the 

full TDDFT eigenvalue problem. However, applying the 
preconditioner requires the computation of D^ 1 , which 
can only easily be constructed in a Kohn-Sham eigenstate 
representation where D is diagonal. In linear-scaling 
TDDFT, a diagonal representation of D is not available 
and the explicit construction of D 1 via matrix inversion 
is undesirable. 

In order to obtain a linear-scaling preconditioner, we 
consider the Tsiper functional of Eq. [9] when precondi¬ 
tioned from the left: 


/ D 0 
^0 D 


a 


Tsi P (p,q) = 


D 


(p f q f ) 


(A-B) 

0 


D” 1 (A + B) 


(P f q f ) 


0 

D _1 


D 



(25) 


While the action of D 1 on a matrix cannot be straight¬ 
forwardly constructed in {Xa}-{<As} NGWF space, the 
action of D is trivially known. Denoting G^ ip = 


/ G xe ^ > 

{ p} 

G 

of Eq. 22 it can be seen that applying the pre¬ 
conditioner D 1 to the Tsiper functional is equivalent to 


as the preconditioned version of the gradient 

{-3}/ 


solving the linear system 


'P {c >H*Gg } 

v p {c} h^g£ 0 } 


G*jH^pW J 


s x<f> 

^Tsip 


(26) 


for G^g ip . This linear system can be solved iteratively 
to a chosen degree of numerical accuracy using a stan¬ 
dard conjugate gradient algorithm (see algorithm 2 de¬ 
tailed in my While applying the preconditioner in 
NGWF space is therefore not as straightforward as in 
Kohn-Sham space, it scales fully linearly with system 
size and only requires a number of comparatively inex¬ 
pensive matrix-matrix multiplications. If the computa¬ 
tional overhead from solving the linear system in every 
step of the conjugate gradient calculation is significantly 
less than the time saved by constructing fewer times 
due to a faster convergence rate, the preconditioning pro¬ 
posed here becomes highly efficient. This point will be 
addressed in more detail in Sec. IIV Al 
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C. Optimising multiple excited states 


In most situations, we are not interested in converg¬ 
ing only the lowest excited state of the full TDDFT 
equation, but rather the subspace spanning the lowest 
N u j excitations {u>i}. Following the considerations in 
Sec. II B| regarding the convergence of higher excited 
states, we introduce a set of N u TDDFT trial vectors 
{pTsi p ; i _ . JV w } ) where 


pTsip _ 



(27) 


Differentiating the above expression with respect to 
Pj sip , it is possible to construct a contravariant gradi¬ 
ent that is orthogonal to all current subspace 

vectors {P^ slp }- The gradient can be written as 




In order to span the appropriate subspace, {pJ sl Pj is 
required to follow an orthonormality condition that is 
written as 


1 

2 



pipit gxpRIgb 


Tr 


pipit g,\p{<?} g<£ 


= Si 


(28) 

In the algorithm presented here, in close analogy to the 
way the Tamm-Dancoff eigenvalue problem is treated (20]. 
the orthonormality condition is enforced using a Gram- 
Schmidt procedure. However, some additional care has 


to be taken, since the quantity Tr 


pipit gxpDIg^ 


is 


not required to be positive-semidefinite. Following the 
convention established in Ref. [T9], prior to orthogonal- 


ising the set {P^ S1P }j Tr 


pMtgYpM g 4 


is computed 


for all i and, if found negative, the vector Pj Mp is trans¬ 
formed according to 



(29) 


Once an orthonormal set {P j lsip ; % = 


^ Sip ; * = 

has been obtained, it can be used to construct 
{(?Tsip)b * = from Eq. 19 It is then possi¬ 

ble to write down an effective functional for the sum of 
the lowest N u excitations of the system, such that 


min 1 } 
{pTsip} 


({&™ p }) = £ 


= N u.; 


(30) 


where 


n 


({E™ p }) = t 


N w 
Tsip l|i-: 


Tr 

pMIgX 

te) 

i 

2 

Tr 

pMfg 

xpDIgb 


Tr 


+ 


RltgX (fX4 ^ S 0 


MJi' 


Tr 


>{p}tgxp{?}g^ 


>(.31) 


and the minimisation in Eqn. [30] is to be carried out 
under the effective orthonormality constraint placed on 



The fact that is orthogonal to all current sub¬ 

space vectors {p^ 15113 } is crucial for the correct perfor¬ 
mance of the algorithm. It ensures that when using 

(®Tsi P ) as a s t ee P es t descent direction to update P^ 13113 
such that 



for some given line step A, there is no violation of the 
orthonormality constraint of Eq. [28] to first order in A. 

The above outline contains all of the basic ingredients 
to construct a preconditioned conjugate gradient algo¬ 
rithm capable of solving for the lowest TDDFT ex¬ 
citations of the system in linear-scaling effort. The ex¬ 
act algorithm used in the implementation discussed in 
this work is adapted from [3j|| for the purpose of solving 
the full TDDFT eigenvalue problem (See algorithm 1 in 
E3). Here we limit ourselves to the comment that while 
minimising the Tsiper functional of Eq. [3l] yields a set 
of vectors spanning the same subspace as the TDDFT 
eigenvectors corresponding to the lowest N u eigenvalues, 
these eigenvalues can only be obtained through a sub¬ 
space diagonalisation that scales as 0(N%). However, 
since N u is generally taken to be small, this single cubic 
scaling step is not considered to be a bottleneck in any 
practical calculation. The subspace diagonalisation can 
be carried out by diagonalising the N u x N u dimensional 
matrix Q with matrix elements 


Qij 


Tr 


pIPltgY 




+ Tr 
2 




(34) 

While the subspace diagonalisation only has to be car¬ 
ried out at the end of the calculation, the Gram-Schmidt 
orthonormalisation has an 0(N%) scaling associated with 
it and thus, like the TDA implementation[2D], the algo¬ 
rithm only shows a linear-scaling with system size for 
a constant number of excitations. However, we point 
out that for many systems, especially the large pigment- 
protein complexes mentioned as the main focus of this 
work, low energy absorption properties are dominated 
by a small number of excitations of interest that stays 
constant with system size and for this class of systems, 











































the method presented here allows for truly linear-scaling 
calculations. 


IV. RESULTS 


In this section we will present some of the strengths of 
the algorithm developed in Sec. m and specifically ad¬ 
dress the question of whether full TDDFT produces sig¬ 
nificantly more accurate results compared to the TDA 
for a certain class of systems. We will first focus on 
trans-azobenzene, a molecule small enough to be easily 
treatable with standard cubic-scaling implementations of 
TDDFT. After comparing the linear-scaling TDDFT ap¬ 
proach presented here to these benchmark calculations 
we will discuss the importance of preconditioning the full 
TDDFT eigenvalue equation in order to speed up conver¬ 
gence. Section flY B| then aims at reproducing experimen¬ 
tal results of bacteriochlorophyll a in an organic solvent, 
addressing again the question of whether full TDDFT 
provides a significant advantage over the TDA in this sys¬ 
tem, as well as the influence of an explicit treatment of 
solvent molecules on the spectrum. Finally, it is demon¬ 
strated that the algorithm is capable of obtaining low 
energy excitations in linear-scaling effort by computing 
the excitations in system sizes of up to ss 7000 atoms. 

All calculations performed in this section are done 
using the PBE functional [JO]. Norm-conserving 

pesudopotentials[H], as well as the projection operator 
and the joint NGWF set of Eq. H5 are used throughout 
for the ONETEP TDDFT calculations. 


A. Trans -azobenzene 

As a first test system, we choose trans -azobenzene 
(C 12 H 10 N 2 ) as its moderate size allows for detailed 
benchmark comparisons to conventional TDDFT imple¬ 
mentations showing an 0(N 3 ) scaling. Furthermore, the 
optical spectrum of this system has already been stud¬ 
ied to a high degree of accuracy using the GW approx¬ 
imation and the Bethe-Salpeter equation (BSE), where 
it was found that the TDA generated considerable er¬ 
rors compared to a full solution to the Bethe-Salpeter 
equation [27| . While no straightforward comparison can 
be drawn between the GW+BSE and TDDFT, the sim¬ 
ilar structure of the equations leads us to expect that 
TDDFT in the TDA and full TDDFT will also yield sig¬ 
nificantly different results for this system, making it an 
ideal test case. 

First the ionic positions of trans-azobenzene are 
optimisedj42] in ONETEP, after which the lowest eight 
excitations are computed both in the TDA and in full 
TDDFT. The TDA results are obtained using the al¬ 
gorithm introduced in Ref. [20], while the full TDDFT 
results are computed using the algorithm introduced in 
this work. Calculations are performed using a kinetic en¬ 
ergy cutoff of 1000 eV and a box size of 56.69 x 56.69 x 



ONETEP TDA 

NWChem TDA 

ONETEP RPA 

NWChem RPA 

1 

2.233 

2.192 

2.184 

2.149 

2 

3.520(0.047) 

3.524(0.091) 

3.516(0.286) 

3.518(0.186) 

3 

3.536(0.001) 

3.546(0.001) 

3.489(0.002) 

3.499(0.003) 

4 

3.720(1.094) 

3.681(1.025) 

3.408(0.499) 

3.379(0.751) 

5 

3.822 

3.866 

3.821 

3.865 

6 

3.875(0.003) 

3.923(0.001) 

3.874 

3.922 

7 

4.234(0.001) 

4.268(0.001) 

4.229(0.001) 

4.262(0.001) 

8 

4.315 

4.305 

4.241 

4.230 


TABLE I: Lowest eight excited states of azobenzene, as com¬ 
puted with ONETEP and NWChem. Energies are given in eV, 
oscillator strengths are shown in brackets. States without 
specified oscillator strengths are dark. Where necessary, the 
states have been reordered according to their character, such 
that the order is the same as for the ONETEP TDA results. For 
the NWChem calculations, an aug-cc-pVTZ Gaussian basis 
set is used. Here, TDA denotes the Tamm-Dancoff approx¬ 
imation, while RPA is used to denote a solution to the full 
TDDFT equation. 


56.69 A 3 . In order to avoid any interaction between pe¬ 
riodic images, the TDDFT calculations are carried out 
in open boundary conditions [431 . A minimal set of one 
NGWF per H and four NGWFs for each C and N atom 
is used both for {Xa} and {4>p}. A localisation radius of 
10 oo is applied to {(j>p} representing the occupied sub¬ 
space, while the localisation is relaxed to 13 Qq for {x a } 
in order to better represent the more delocalised unoc¬ 
cupied states. In some molecules, low energy excitations 
can be found that are of very delocalised Rydberg-state 
character and provide a challenge for localised basis set 
representations. However, tests performed using TDA 
have shown that these states can be systematically con¬ 
verged by increasing localisation radius of {y a }, and con¬ 
verged values are generally found to be in good agreement 
with results obtained from real-space methods [201. For 
the purpose of this work, we have performed a number 
of convergence tests, increasing the localisation radius to 
18 ao and found the lowest eight excitations of alizarin 
to be well converged at the localisation radius of 13 ao- 

The TDDFT results from ONETEP are compared to re¬ 
sults obtained using NWChem [44]. The NWChem calcu¬ 
lations are performed using the same ionic positions as in 
ONETEP and an aug-cc-pVTZ Gaussian basis set contain¬ 
ing diffuse functions that are designed to yield a good 
description of weakly bound unoccupied states [45]. In 
order to make the all-electron calculations more compa¬ 
rable to the pseudopotential calculations in ONETEP, the 
Kohn-Sham states corresponding to the core electrons 
are excluded from the occupied subspace when calculat¬ 
ing the TDDFT excitation energies. 

The results for the lowest eight excitations for the TDA 
and full TDDFT as calculated using ONETEP in compari¬ 
son to the NWChem results can be found in Table U As 
can be seen, there is a generally good agreement between 
the ONETEP and the NWChem results, both in terms of 
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FIG. 1: Convergence of the 2 lowest states of Azobenzene 
for different degrees of preconditioning applied. “Precond 
etoi” describes the tolerance to which the linear system (261 
is solved in order to apply the preconditioning. 


the excitation energy and the oscillator strength. The 
largest discrepancy is found in the the sixth excited state, 
with the ONETEP results being 48 meV lower in energy for 
both the TDA and full TDDFT. Most other results show 
a significantly smaller discrepancy. 

Generally, the full TDDFT results compare very sim¬ 
ilarly to the NWChem benchmark as the TDA results, 
showing that the algorithm indeed performs correctly. 
Remaining discrepancies are most likely due to the all¬ 
electron treatment in NWChem compared to the pseu¬ 
dopotential treatment in ONETEP, as well as basis set dif¬ 
ferences (see Ref. (20] for a detailed comparison between 
NWChem and ONETEP regarding TDA results). It is thus 
evident that the minimal NGWF representation is ca¬ 
pable of obtaining excitation energies of a comparable 
quality to those from considerably larger Gaussian basis 
set representations, highlighting the advantages of us¬ 
ing an in situ optimised representation when performing 
TDDFT calculations. 

Regarding the impact of full TDDFT on the low energy 
excited states of trans- azobenzene, it becomes apparent 
that when solving the full TDDFT equations, the domi¬ 
nant low energy excited state, which is mainly made up 
of a HOMO-1—*LUMO transition, decreases in energy by 
more than 0.31 eV as compared with the TDA results. 
The peak with the second highest oscillator strength, 
corresponding to a HOMO-2—»LUMO transition, stays 
almost constant in energy. In the full TDDFT, a signifi¬ 
cant part of spectral weight is shifted from the dominant 
peak to the second peak. This result is in good qualita¬ 
tive agreement with the GVF+BSE results, where it was 
found that the TDA blue-shifts the dominant peak by 
0.2 eV and overestimates its oscillator strength[271. It is 
thus clear that solving the full TDDFT equation instead 
of the TDA can lead to significant changes in the com¬ 
puted optical spectrum, with bright peaks being shifted 
by tenths of electronvolts and a significant redistribution 


of spectral weight occouring. 


The small size of trans-azobenzene makes it an ideal 
system verify the effectiveness of the preconditioning in¬ 
troduced in this work. For this purpose, the we compute 
the two lowest excited states of the system for a number 
of different convergence tolerances e to i of the iterative 
preconditioner [37], as well as the case when no precon¬ 
ditioning is applied to the conjugate gradient algorithm. 
The convergence rate of the two lowest excited states 
with respect to different levels of preconditioning applied 
can be found in Fig. [T] Note that iteratively applying 
the preconditioner to a tolerance of 1(R 4 cuts the num¬ 
ber of iterations needed to reach convergence by almost 
a factor of four compared to the case where no precon¬ 
ditioning is applied. This highlights the ill-conditioning 
of the TDDFT eigenvalue equation mentioned in section 
|IIIB| and shows that preconditioning is vital in achieving 
good convergence rates. However, note that even when 
the iterative preconditioning tolerance is set as high as 
1(R 2 , corresponding to only applying the preconditioner 
approximately at each iteration, the number of iterations 
needed to reach convergence is decreased by a factor of 
two. This finding is vital as solving the linear system of 
Eq. 26 iteratively in each conjugate gradient step has a 
computational overhead associated with it, which can be 
minimised if the solution is only obtained approximately 
rather than to a high degree of numerical accuracy. It 
should be pointed out however, that the computational 
overhead with associated with the preconditioning is neg¬ 
ligible for the system at hand, making up 0.17% and 
0.53% of the total calculation time for e to i = 10 -2 and 
etoi = lO -4 , which can again be attributed to the com¬ 
pact size of the NGWF representation. Furthermore, it 
is found that the total calculation time is reduced by a 
factor of 2.86 when comparing the unconditioned system 
to a preconditioned system with etoi = 10 —4 , thus show¬ 
ing that the preconditioner introduced here indeed leads 
to significant reductions in computational effort. 


We note that the convergence of the preconditioned 
system with the high convergence tolerance of e to i = ICR 2 
closely follows the fast convergence of the tight tolerance 
results of etoi = HR 8 for the first ten iterations. This 
suggests that the convergence tolerance of the precon¬ 
ditioner can be chosen adaptively. Starting off with a 
high tolerance for the first few iterations and tighten¬ 
ing it closer to convergence has the potential to provide 
the ideal balance between reducing the computational 
overhead of the preconditioning and increasing the con¬ 
vergence rate. In conclusion it is demonstrated that the 
preconditioned conjugate gradient algorithm introduced 
in this work yields a very good agreement with exist¬ 
ing TDDFT implementations and shows excellent con¬ 
vergence rates. 
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FIG. 2: Structure of bacteriochlorophyll a, as obtained from a 
geometry optimisation in vacuum (left) and a single snapshot 
from an MD simulation in toluene (right). This figure was 
created using VMD[H|. 


B. Bacteriochlorophyll 


We now shift the focus to bacteriochlorophyll a 
(MgN 4 C> 6 C 55 H 74 ), a chromophore that is of great in¬ 
terest in computational biology due to its role in light¬ 
harvesting complexes [THSj. Bacteriochlorophyll (BChl) 
is a medium-sized system still within the range of con¬ 
ventional cubic scaling TDDFT implementations. There¬ 
fore, the focus in this section is not to demonstrate the 
linear-scaling capabilities of the algorithm developed in 
this work but rather to address the question of whether 
full TDDFT yields a better description of the low energy 
absorption spectrum than the TDA when compared to 
experimental results. 

All TDDFT calculations performed in this section are 
carried out using a minimal set of NGWFs for all atomic 
species involved and a localisation radius of 10 a 0 and 
13 ao for the NGWFs representing the occupied and un¬ 
occupied space respectively. A kinetic energy cutoff of 
1020 eV is used in all calculations. 


We aim to compare to the experimental results for the 
low energy absorption spectrum of BChl a in a toluene 
organic solvent [46} [47j. As a first step, we take the atomic 
positions of BChl a site 1 in the Fenna-Matthews-Olson 
complex (see Ref. [3] for an explanation of how the input 
structure was obtained from X-ray diffraction results), 
optimise the atomic positions in vacuum and then cal¬ 
culate the TDA and full TDDFT spectra of the system 
within an implicit solvent model [38], where the static di¬ 
electric function e is chosen to be 2.38 in order to match 
the dielectric function of toluene at room temperature. 
The final structure of BChl a in vacuum can be found in 
Fig - E As can be seen, the porphine ring is entirely flat in 
this configuration, while the alkane tail folds underneath 
the ring structure. 

The results of the TDDFT calculation as compared 
with the experimental results [53] can be found in Fig. 3a 


As can be seen the experimental spectrum shows three 
main features: A main absorption peak at around 1.6 eV, 
a shoulder between 1.65 and 1.75 eV and a second peak 
at around 2.1 eV. The first and second peaks are com¬ 
monly referred to as the Qy and Qx transitions respec¬ 
tively and can be characterised as HOMO—>LUMO and 
HOMO-1—)-LUMO transitions in a single-particle picture. 


As can be seen from Fig. 3a the TDA results generate 
only a single main peak at 2.12 eV that is of Qy char¬ 
acter. This main peak shows a shoulder at 2.05 eV that 
is mainly of HOMO-1—>TUMO and HOMO-2—>-LUMO 
character. The full TDDFT spectrum on the other hand 
produces a Qy peak at 1.80 eV that is the lowest ex¬ 
citation of the system, as well as a second peak of Qx 
character at 1.98 eV, but fails to reproduce a shoulder to 
the Qy peak. It also shows a third peak with small os¬ 
cillator strength at 2.11 eV that is of HOMO-2—)-LUMO 
character. 


It can therefore be concluded that the TDA fails in 
correctly reproducing the absorption spectrum of BChl a 
in toluene. Not only is the Qy transition overestimated 
by 0.47 eV compared to the experimental results, it does 
not correspond to the lowest excitation of the system and 
there is no clean Qx transition. The full TDDFT results 
show a considerable improvement. While the Qy transi¬ 
tion is still overestimated by 0.2 eV, it now corresponds 
to the lowest excitation of the system and there is a con¬ 
siderable splitting between the Qy and Qx transitions. 
However, the full TDDFT results underestimate the en¬ 
ergy of the Qx transition by around 0.14 eV compared 
to experimental results and fail to exhibit a shoulder to 
the Qy transition. 

The origin of some of the failures of the TDA can 
be traced by breaking down the excitations into indi¬ 
vidual Kohn-Sham transitions. In full TDDFT, the Qy 
transition is almost exclusively (to 95%, as compared to 
60% in the TDA) a transition between the HOMO and 
the LUMO. In Bacteriochlorophyll, this transition has a 
strong dipole moment associated with it, which in turn 
causes Vg^ F to be large and the TDDFT energy to show 
a large increase compared to the HOMO-LUMO energy 
difference. In full TDDFT, this large dipole is screened 
by the de-excitation vector Y which is almost entirely 
made up of the same HOMO-LUMO transition, thus sig¬ 
nificantly lowering both the excitation energy and the 
oscillator strength. In the TDA, Y = 0 and instead the 
large dipole moment of the HOMO-LUMO transition is 
screened by mixing in smaller fractions of higher energy 
transitions, including fractions of Qx transition. This 
causes the Qy transition to have a significantly higher 
energy and larger oscillator strength in the TDA and also 
contributes to the absence of a clean Qx transition. 

Since optimising the atomic positions of BChl a in vac¬ 
uum might have lead to a structure that is unrealistic for 
the system solvated in toluene, it is not clear how much of 
the failure of full TDDFT to reproduce the experimental 
results is due to the choice of exchange-correlation func¬ 
tional used in this work. In order to obtain a more real¬ 
istic structure of BChl a in toluene, we make use of the 
classical molecular dynamics package AMBLI{|491. We 
solvate BChl a in 704 molecules of toluene (correspond¬ 
ing to 10,700 atoms for the total system) and equilibrate 
the system to 300 K and a pressure of 1 atm, followed by 
a 300 ps simulation in the NVE ensemble. From this MD 
run we take 8 snapshots 10 ps apart that are then used 
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FIG. 3: Absorption spectra of bacteriochlorophyll a as calcu¬ 
lated with the TDA and full TDDFT, both using the opti¬ 
mised vacuum structure and the 8 MD snapshots in toluene. 
Fig. [3a] shows the spectrum using the vacuum structure and 
an implicit solvent model only, while Fig. [3b] shows the aver¬ 
aged spectrum of 8 MD snapshots in 15 A of explicit toluene. 
Fig. [3c| shows a comparison between the averaged spectrum 
in 15 A^ of explicit toluene and the averaged spectrum where 
the toluene is replaced by an implicit solvent. A Lorentzian 
broadening of 0.025 eV is used for the TDDFT results and 
the experimental data is scaled such that the peak height of 
the Qy transition agrees with that of the Qy peak obtained 
from full TDDFT in Fig. [3a] 


as input atomic positions in the TDDFT calculations. In 
order to include solvent effects at a quantum mechanical 
level, we explicitly include all toluene molecules within a 
15 A radius from the Mg atom in the calculation, while 
representing the rest of the solvent by an implicit solva¬ 
tion model. This process yields a system size of 700-800 
atoms depending on the snapshot, which is closer to the 
limit of sizes that can be treated by conventional 0(N 3 ) 
methods. The structure of the BChl a molecule as ob¬ 
tained from a single MD snapshot can be found in Fig. [5] 
As can be seen, the alkane tail extends away from the 
porplrine ring in this configuration, while the ring itself 
is no longer perfectly flat. 

The averaged full TDDFT and TDA spectra of the sol¬ 
vated MD snapshots as compared with the same experi¬ 
mental dataset are shown in Fig. [3b] Note that the full 
TDDFT results now show the Qy transition at 1.83 eV 
and the Qx transition at 2.10 eV. While the Qy transition 
is still overestimated by 0.2 eV, the shape of the feature 
is considerably improved, as the Qx transition is now in 
very good agreement with experimental results, both in 
its positioning and in its intensity. It is worth pointing 
out that a number of snapshots also show a shoulder to 
the main Qy transition that is of HOMO-2—>TUMO char¬ 
acter. However, in the averaged spectrum this feature 
is not as pronounced as in the experimental spectrum, 
which can be due to the fact that the splitting between 
the Qx and Qy transitions is too small compared to ex¬ 
periment. The TDA results on the other hand still fail to 
reproduce main spectral features. The Qy transition is 
overestimated by 0.5 eV, although the shoulder present 
in Fig. [3a] has disappeared. The spectrum shows a new 
peak at approximately 2.3 eV that does however have a 
different character to the Qx transition in full TDDFT. 
A clearly identifiable Qx transition is still absent from 
the TDA results. It can therefore be summarised that 
full TDDFT yields a much improved representation of 
the experimental results at the PBE level as long as a 
realistic structure for the solute and the solvent environ¬ 
ment is obtained. The main failure of full TDDFT for 
this system is the overestimation of the Qy transition by 
0.2 eV which can most likely be ascribed to errors in the 
exchange-correlation functional used. 

While the 700-800 atom systems obtained from clas¬ 
sical MD yield a relatively good spectral shape for full 
TDDFT at the PBE level, they are considerably larger 
than the 140 atoms of the solute alone. It is therefore 
worth investigating how much of the improvement of the 
spectrum from Fig. [3a] to Fig. [3b] is due to the differ¬ 
ent ionic positions of the solute and how much is due 
to an explicit quantum mechanical treatment of solvent 
molecules in the calculation. For this purpose we take 
the atomic positions of BChl a from the 8 MD snapshots 
and compute the absorption spectrum in implicit solvent, 
without including any explicit representation of toluene 
molecules. The result can be found in Fig. [3c] As can 
be seen, the positioning of the Qx transition in implicit 
solvent is in very good agreement with the experimen- 
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Number of atoms 

FIG. 4: Time taken for applying the TDDFT operator and 
calculating Vgjjp for different system sizes of bacteriochloro- 
phyll a in toluene. The lines shown are linear fits. Calcula¬ 
tions were performed on eight SandyBridge nodes containing 
16 cores each. 

tal data. However, its oscillator strength is significantly 
overestimated. Furthermore, the explicit solvent envi¬ 
ronment causes the Qy transition to red-shift by about 
0.1 eV. It can be concluded that while most of the im¬ 
provements in spectral features compared to Fig. [3a] are 
due to the more realistic atomic positions of the Bac- 
teriochlorophyll in toluene, the explicit inclusion of the 
toluene environment at the TDDFT level yields a spec¬ 
trum that is in closest agreement with experimental re¬ 
sults. 

In conclusion it can be summarised that full TDDFT 
at the PBE level reproduces experimental results to a 
satisfactory degree, while the TDA completely fails in 
this system. The best agreement between experiment 
and calculation is obtained when taking the atomic posi¬ 
tions of an MD snapshot of BChl in toluene and including 
the local solvent environment explicitly in the TDDFT 
calculation. While an explicit treatment of the solvent 
molecules requires large scale TDDFT calculations, it is 
demonstrated that the method presented in this work is 
well suited for tackling these systems, opening up the 
possibility of more detailed studies of solvent effects on 
chromophores. 

C. Linear-scaling capabilities 

We now focus on demonstrating the linear-scaling ca¬ 
pabilities of the TDDFT method presented in this work. 
For this purpose, we take one of the classical MD snap¬ 
shots studied in the previous section and compute the 
Qy transition of BChl a, including progressively larger 
regions of the solvent environment. Calculations are per¬ 
formed using the same parameters as in the previous sec¬ 
tion, apart from using a smaller radius of 10 Oo for the 
NGWF set {xa}- In order to reach linear-scaling compu¬ 
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FIG. 5: The electron-hole difference density of the Qy tran¬ 
sition of bacteriochlorophyll surrounded by a sphere of 30 A 
radius of toluene solvent molecules. The calculation includes 
454 toluene molecules, corresponding to a total system size of 
6950 atoms. The figure was created using VMD[Hg. 


tational effort with system size, it is necessary to truncate 
all involved density matrices P^, P^, P^ and P^ 
to make matrix-matrix operations scale as O(N). Here, 
we choose a tight truncation radius of 20 ao, causing the 
unoccupied and ground-state density matrices to have 
the same sparsity pattern as the NGWF representation 
overlap matrices S x and S e< . However, by performing a 
full TDDFT calculation on the 770-atom system of the 
previous section, we have confirmed that the error intro¬ 
duced to the energy of the Qy transition for such a cutoff 
is only 0.03 eV. 

While the Qy transition of interest in this system re¬ 
tains a relatively localised character and the error in¬ 
troduced through truncating P^ and P^ can be ex¬ 
pected to be relatively small, the total error of 0.03 eV is 
a combination of errors introduced by the response den¬ 
sity matrices and by the truncation of the ground state 
density matrix. The calculations show, that for relatively 
localised excitations, fully linear-scaling calculations on 
realistic systems are possible with only introducing minor 
errors. For very delocalised excitations, the truncation 
of the response density matrix becomes more difficult 
and the effects of applying a truncation on long-range 
charge-transfer excitations has been discussed elsewhere 
in more detailpOj. Here we note, that for a large class 
of systems like pigment-protein complexes, excitations 
of interest are expected to retain a relatively localised 
character and fully linear-scaling calculations are indeed 
possible. 

We choose to perform a linear-scaling test of the full 
TDDFT method on four different system sizes of Bchl 
in Toluene, each specified by the radius, as measured 
from the Mg atom, up to which solvent molecules are in¬ 
cluded in the calculation. The four different radii chosen 
are 15, 20, 24 and 30 A, corresponding to system sizes 
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of 770, 1925, 3455 and 6950 atoms. The timings, both 
for evaluating the self-consistent field response VgJ| F and 
applying the full TDDFT operator (19) are shown in 
Fig- El In general, a clear linear trend can be observed 
for both the calculation of VgQ F and the TDDFT opera¬ 
tor, with slight discrepancies for the two smaller systems. 
These discrepancies are likely originating from the fact 
that matrices in the two smallest systems are still rela¬ 
tively dense and a full transition to linear scaling effort 
only occurs at larger system sizes. However, it becomes 
clear that the algorithm is capable of solving the full 
TDDFT eigenvalue equation for systems of thousands of 
atoms in linear-scaling effort. Furthermore, recent work 
implementing hybrid OpenMP-MPI approaches to par¬ 
allelism mean that these calculations scale efficiently to 
many thousands of CPU cores [51] . 

Figure [5] shows a plot of the electron-hole difference 
density for the fully converged Qy transition of the 6950 
atom system. It should be noted that these system sizes 
are inaccessible by standard 0(N 3 ) approaches, both at 
the DFT and the TDDFT level. The calculation pre¬ 
sented here is to be seen for demonstration purposes only, 
given that the Qy transition retains a relatively localised 
character and a fully quantum mechanical treatment of 
such a large region of the solvent environment can be con¬ 
sidered unnecessary. However, it should be pointed out 
that the system treated here is of similar size as a variety 
of pigment-protein complexes, most notably the Fenna- 
Matthews-Olson complex, where seven BChl molecules 
are embedded in a complex protein environment and site 
energy variations due to environmental screening effects 
are both subtle and important [Tj-|3] . The method pre¬ 
sented in this work has the potential of treating these 
systems fully quantum mechanically, without relying on 
semi-classical approximations. 


the compact NGWF representation and shown that it 
yields results comparable to those obtained with well- 
converged Gaussian basis sets. Furthermore, the vital 
importance of preconditioning the iterative solution to 
the TDDFT equation has been demonstrated, yielding a 
four-fold speedup of the convergence rate in the case of 
trans-azobenzene. 

We have furthermore shown that the TDA fails to re¬ 
produce experimental results of BChl a in toluene solu¬ 
tion at the PBE level, while full TDDFT yields a signif¬ 
icant improvement of all spectral features. It was also 
shown that the best results compared to experimental 
data are achieved when treating a certain part of the 
solvent explicitly at TDDFT level, making it necessary 
to perform TDDFT calculations of several hundreds of 
atoms. While these system sizes can reach the the limits 
of standard 0(N 3 ) methods, they are straightforwardly 
treated with the method introduced here, opening up 
the possibility of more detailed studies on the effects 
of pigment-solvent interactions on excitation energies of 
chromophores. 

Finally, we have shown that the algorithm scales fully 
linearly with system size as long as all involved density 
matrices are truncated. The largest full TDDFT calcu¬ 
lation performed in this work treats a system of 6950 
atoms, far larger than systems that can be realistically 
addressed with cubic-scaling approaches. These large- 
scale systems are of the same order of magnitude as a 
large variety of pigment-protein complexes that are stud¬ 
ied in the held of computational biology, opening up the 
possibility of computing their excitation spectra without 
the need to rely on any semi-classical approximations. 
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